Abstract


The Jelinski-Moranda and Geometric models for software reliability failed the consistency test
which we proposed. We challenged these models to take data which comes from a process which they
have correctly modeled and to make predictions about the reliability of that process. We found that
either model, given data precisely from a process it correctly models, will usually fail to make good
predictions. We attribute these problems to randomness in the data used as input to the models and
indicate a remedy for this lack of robustness, namely replication of data.


Additional Key Words and Phrases: Growth Models, Software Reliability, Simulation, and Replication 


0. Introduction


The Jelinski-Moranda [1] and the Geometric [2] models are famous and widely used in the field
of software reliability. These models assume the software being modeled is a Poisson Process with
constant failure rate between two consecutive failures. Both models use the sequence of interfailure
times from the debugging process to make maximum likelihood estimates of parameters associated with
the models. These estimated parameter values are then used to calculate estimates of reliability
measures such as the MTTF of the debugged product. That is, they predict the future performance of the
software based on the data from the debugging process. The Jelinski-Moranda model is often criticized
for requiring an identical failure rate for all bugs, but the Geometric model is not subject to this
criticism. We will show that even if we assume either model correctly models reliability for a piece of
software, we still cannot expect good predictions from that model. Also, we will demonstrate the
benefits of replicated debugging as a remedy for this problem. It should also be noted that all software
reliability models potentially suffer from the same problem and those that have this problem should
benefit from replication.


Both models are intended to be used as prediction systems as described in Abdel-Ghaly, Chan
and Littlewood [3]. That paper characterizes a prediction system as consisting of three stages, namely a
probabilistic model, a statistical inference procedure for estimating model parameters and a prediction
procedure for predicting future interfailure times. All three components are seen as critical to the
prediction system. The problem we address is in the second stage of the prediction process.


This research was in part supported by NASA Grant 1*750.

Special thanks are due to G. E. Migneault whose ideas formed the starting point for this work.

1. Maximum Likelihood Equations

The Jelinski-Moranda model assumes that there are $N$ initial bugs in the software, that each bug
has the common failure rate $\phi$, and that the failure rate of the program is the number of bugs present
multiplied by $\phi$. Thus, if $i-1$ bugs have been removed, the failure rate is $\lambda_i = (N-i+1)\phi$. If $n$
errors have been removed then interfailure times $t_1, t_2, \ldots, t_n$ have been generated and these may be
inserted into the following likelihood equation which corresponds to this model.

$$L(t_1, t_2, \ldots, t_n; N, \phi) = \prod_{i=1}^{n} (N-i+1)\,\phi\, e^{-(N-i+1)\phi t_i}$$

This likelihood equation may be maximized by letting $N = \hat{N}$ and $\phi = \hat{\phi}$, where $\hat{N}$ and $\hat{\phi}$ form the
solution of the following equations. $\hat{N}$ and $\hat{\phi}$ are estimators of $N$ and $\phi$.
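
The equations themselves are not legible in this copy. As a reconstruction rather than the authors' typeset form, and assuming the failure rate $\lambda_i = (N-i+1)\phi$ defined above with $N$ treated as a continuous variable, the log-likelihood is

$$\ln L = \sum_{i=1}^{n} \ln(N-i+1) + n \ln \phi - \phi \sum_{i=1}^{n} (N-i+1)\, t_i .$$

Setting its partial derivatives with respect to $\phi$ and $N$ to zero gives

$$\hat{\phi} = \frac{n}{\hat{N} \sum_{i=1}^{n} t_i - \sum_{i=1}^{n} (i-1)\, t_i}, \qquad \sum_{i=1}^{n} \frac{1}{\hat{N}-i+1} = \frac{n \sum_{i=1}^{n} t_i}{\hat{N} \sum_{i=1}^{n} t_i - \sum_{i=1}^{n} (i-1)\, t_i} .$$

The second equation involves only $\hat{N}$ and must be solved numerically; $\hat{\phi}$ then follows from the first.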
The Geometric model assumes that the failure rate after removing $i-1$ bugs is $\lambda_i = \alpha\beta^{i-1}$. Again
the data of $n$ interfailure times is inserted into the corresponding likelihood equation.

$$L(t_1, t_2, \ldots, t_n; \alpha, \beta) = \prod_{i=1}^{n} \alpha\beta^{i-1}\, e^{-\alpha\beta^{i-1} t_i}$$

This likelihood may be maximized by letting $\alpha = \hat{\alpha}$ and $\beta = \hat{\beta}$, where $\hat{\alpha}$ and $\hat{\beta}$ form the solution to
the following equations and are used as estimators.
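
These equations are also missing from this copy. A reconstruction under the assumption that the failure rate is $\lambda_i = \alpha\beta^{i-1}$, which may differ in form from what the authors printed, starts from the log-likelihood

$$\ln L = n \ln \alpha + \sum_{i=1}^{n} (i-1) \ln \beta - \alpha \sum_{i=1}^{n} \beta^{i-1} t_i .$$

Setting the partial derivatives with respect to $\alpha$ and $\beta$ to zero and eliminating $\alpha$ gives

$$\hat{\alpha} = \frac{n}{\sum_{i=1}^{n} \hat{\beta}^{\,i-1} t_i}, \qquad \frac{\sum_{i=1}^{n} (i-1)\, \hat{\beta}^{\,i-1} t_i}{\sum_{i=1}^{n} \hat{\beta}^{\,i-1} t_i} = \frac{n-1}{2} .$$

The left side of the second equation is a weighted mean of $i-1$ that increases with $\hat{\beta}$, so for $n \ge 2$ and positive $t_i$ it has a single root.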
We show in the following sections that neither of these models is robust. That is, if you were to
debug the same program twice, generating two sequences of interfailure times, then each model may
give two very different estimates for its parameters.

2. Simulation of Interfailure Times

Since these models each describe a Poisson Process with constant failure rate $\lambda_i$, the probability
that the next interfailure time is less than $t$ is $1 - e^{-\lambda_i t}$. Thus we may obtain an interfailure time by
generating a uniformly distributed random number $r$ between 0 and 1 and solving $r = 1 - e^{-\lambda_i t_i}$ for $t_i$
[4].
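
The sampling step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' program; the function name `sample_interfailure_time` and the seed are assumptions made for the example. Solving $r = 1 - e^{-\lambda t}$ for $t$ gives $t = -\ln(1-r)/\lambda$.

```python
import math
import random

def sample_interfailure_time(rate, rng):
    """Draw one interfailure time by inverting the CDF 1 - exp(-rate * t)."""
    r = rng.random()                    # uniform on [0, 1)
    return -math.log(1.0 - r) / rate    # solves r = 1 - exp(-rate * t) for t

# Illustrative check: the sample mean should be near 1 / rate.
rng = random.Random(1)                  # hypothetical seed, for repeatability
times = [sample_interfailure_time(0.1, rng) for _ in range(20000)]
mean_time = sum(times) / len(times)
```

Using `1.0 - r` keeps the argument of the logarithm in (0, 1], so the call never fails on `r = 0`.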

3. Testing the Models

A. Jelinski-Moranda Model tests 

We assumed a piece of software which has its reliability correctly modeled by Jelinski-Moranda,
with parameters $N$ and $\phi$ fixed. Thus $\lambda_1 = N\phi$ and we used the simulation process to generate $t_1$,
decreased $\lambda_1$ by $\phi$ to get $\lambda_2$ and simulated to get $t_2$. After iterating $n$ times we had data representing
$n$ interfailure times from one debugging run.
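
A single simulated debugging run of this kind might look like the sketch below. The function name `simulate_jm_run` and the seed are illustrative assumptions; the rate is recomputed as $(N-i+1)\phi$ at each step, which is equivalent to subtracting $\phi$ repeatedly but avoids accumulated rounding.

```python
import math
import random

def simulate_jm_run(N, phi, n, rng):
    """Simulate n interfailure times from a Jelinski-Moranda process."""
    if n > N:
        raise ValueError("cannot remove more bugs than the N initially present")
    times = []
    for i in range(n):                  # i bugs have already been removed
        rate = (N - i) * phi            # lambda_1 = N * phi, then drops by phi
        times.append(-math.log(1.0 - rng.random()) / rate)
    return times

run = simulate_jm_run(N=100, phi=0.001, n=30, rng=random.Random(0))
```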

The simulated interfailure times were used as input to the model and $\hat{N}$ and $\hat{\phi}$ were calculated.
The estimated values $\hat{N}$ and $\hat{\phi}$ usually differed greatly from the seeded values and there were large
variations among the estimates from different simulations for the same seeded values. Each of the
following histograms was constructed by generating 128 sets of interfailure times for each value of $n$
with $N$ fixed at 100 and $\phi$ fixed at 0.001. The number 128 was chosen arbitrarily and appears to be
large enough for our purposes. Each graph is a plot of the probability of $\hat{N}$ versus $\hat{N}$.
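
The experiment can be approximated with the sketch below. It is not the authors' code: the helper names are hypothetical, $N$ is treated as continuous, and the maximum likelihood condition $\sum 1/(\hat{N}-i+1) = n/(\hat{N}-A)$ with $A = \sum (i-1)t_i / \sum t_i$ is solved by bisection. The sketch also assumes that no finite maximum exists when $A \le (n-1)/2$, and reports $\hat{N}$ as infinite in that case.

```python
import math
import random

def simulate_jm_run(N, phi, n, rng):
    """n interfailure times; the i-th (0-based) has rate (N - i) * phi."""
    return [-math.log(1.0 - rng.random()) / ((N - i) * phi) for i in range(n)]

def estimate_jm(times):
    """Return (N_hat, phi_hat); (inf, 0.0) when no finite maximum is found."""
    n = len(times)
    total = sum(times)
    # A = sum((i-1) * t_i) / sum(t_i) in the paper's 1-based indexing.
    A = sum(i * t for i, t in enumerate(times)) / total
    if A <= (n - 1) / 2:
        return math.inf, 0.0

    def f(N):
        # Derivative condition in N: positive left of the root, negative right.
        return sum(1.0 / (N - i) for i in range(n)) - n / (N - A)

    lo, hi = n - 1 + 1e-9, float(n)
    while f(hi) > 0:                    # expand until the root is bracketed
        lo, hi = hi, 2.0 * hi
        if hi > 1e9:                    # assumption: treat as no finite estimate
            return math.inf, 0.0
    for _ in range(100):                # bisection
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    N_hat = 0.5 * (lo + hi)
    return N_hat, n / (total * (N_hat - A))

# 128 simulated runs with N = 100, phi = 0.001, n = 30, as in the text.
rng = random.Random(0)                  # hypothetical seed
estimates = [estimate_jm(simulate_jm_run(100, 0.001, 30, rng))[0]
             for _ in range(128)]
within_5 = sum(95 <= N_hat <= 105 for N_hat in estimates) / len(estimates)
```

A histogram of `estimates` corresponds to one panel of the figures; the exact proportions depend on the random number generator and seed, so they need not match the published values.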

Figure 1.b shows that for $N = 100$, $\phi = 0.001$, and $n = 30$, $\hat{N}$ falls between 95 and 105 less
than 5% of the time. For the same graph, $\hat{N}$ falls between 85 and 115 approximately 10% of the
time. The other graphs tell a similarly discouraging story. As expected, the best estimates are given by
the case where $n = 70$, but even then only about 55% of the estimates are within 15 of 100. We also
point out that since 70 errors have been removed, we are actually only trying to estimate the remaining
30; an error of 15 is 50% of that remainder, so our estimates are off by 50% or more about 45% of the time.

These results duplicate those of a simulation done by Joe and Reid [5]. We conclude that the
model is very sensitive to random variations in the input data even when the data is precisely what the 
model says it should be. Thus, the Jelinski-Moranda model should not be used to make predictions 
about software based on a single sequence of interfailure times generated by one debugging run. 

B. Geometric Model tests 

The failure rate for the Geometric model is given by $\lambda_i = \alpha\beta^{i-1}$ for each $i$. We assigned $\alpha = 0.1$
in order to have the same initial failure rate as that used in the Jelinski-Moranda tests. We chose
$\beta = 0.8$ as a compromise between the 0.95 which Moranda [2] found for a set of real data and the 0.2 to
0.3 values which appear to be representative of the Nagel [6,7] and Dunham [8,9] data. Data was
simulated using the changing $\lambda_i$ values and the model was used to predict $\lambda_{n+1}$, the failure rate of the
product after $n$ bugs have been removed. Once again the results of 128 repetitions for each value of $n$ are
represented by histograms. In these graphs the y-axis represents the probability of a prediction having a
certain percentage of error relative to the correct value of $\lambda_{n+1}$. The values of $n$ were chosen to be
large enough to illustrate our point but small enough to avoid precision problems in the calculations.

In these histograms the 0% bar represents the proportion of the estimates which fall within plus
or minus 10% of $\lambda_{n+1}$. For $n = 10$, only approximately 12% of the estimates come within 10% of
the desired value. Also for $n = 20$, only approximately 46% of the estimates come within plus or
minus 30% of the correct value. This indicates that the Geometric Model also has trouble handling
the variations in the data from one debugging to the next.
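
A comparable sketch for the Geometric model is given below. The helper names, the seed and the search bracket are assumptions made for illustration. It solves the maximum likelihood condition that the weighted mean of $i-1$, with weights $\beta^{i-1} t_i$, equals $(n-1)/2$, using bisection on $\ln\beta$, and then predicts $\lambda_{n+1} = \hat{\alpha}\hat{\beta}^{n}$.

```python
import math
import random

def simulate_geometric_run(alpha, beta, n, rng):
    """n interfailure times; the i-th (0-based) has rate alpha * beta**i."""
    return [-math.log(1.0 - rng.random()) / (alpha * beta ** i)
            for i in range(n)]

def estimate_geometric(times):
    """Return (alpha_hat, beta_hat) by bisection on log(beta)."""
    n = len(times)

    def g(log_beta):
        # Weighted mean of i with weights beta**i * t_i, minus (n - 1) / 2.
        # The shift rescales the weights so that exp() cannot overflow.
        shift = (n - 1) * log_beta if log_beta > 0 else 0.0
        w = [math.exp(i * log_beta - shift) * t for i, t in enumerate(times)]
        return sum(i * wi for i, wi in enumerate(w)) / sum(w) - (n - 1) / 2

    # Assumed bracket: beta between e**-20 and e**20 (adequate for small n).
    lo, hi = -20.0, 20.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:                  # g increases with beta
            lo = mid
        else:
            hi = mid
    beta_hat = math.exp(0.5 * (lo + hi))
    alpha_hat = n / sum(beta_hat ** i * t for i, t in enumerate(times))
    return alpha_hat, beta_hat

# 128 repetitions with alpha = 0.1, beta = 0.8, n = 10, as in the text.
rng = random.Random(0)                  # hypothetical seed
n = 10
true_rate = 0.1 * 0.8 ** n              # lambda_{n+1}
errors = []
for _ in range(128):
    a_hat, b_hat = estimate_geometric(simulate_geometric_run(0.1, 0.8, n, rng))
    errors.append((a_hat * b_hat ** n - true_rate) / true_rate)
within_10 = sum(abs(e) <= 0.10 for e in errors) / len(errors)
```

Binning `errors` into percentage intervals gives a histogram of the kind described above; the proportions obtained depend on the seed and are not claimed to reproduce the published figures.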




4. Replicated Testing

Replicated debugging was introduced in Nagel [6,7] and was used with increased automation in
Dunham [8,9]. Rather than attempt to summarize these papers here, we refer the interested reader to the
originals. We do note that these papers used multiple programs written to satisfy a common
specification in order to produce high quality replicated data. This data can be used to argue against the
uniform failure rates of the Jelinski-Moranda model but appears to support the exponential decline in the
failure rates associated with the Geometric model. We believe that replication is itself a very powerful
and possibly necessary tool which has not yet been fully appreciated in the field of software reliability.

By replicated debugging we mean the process of repeatedly debugging the same piece of software,
or more precisely the following. Given a piece of software, make $r$ copies and debug each of them
independently (except for shared fixes), removing the bugs from each replicate as they are discovered.
For simulation purposes we chose to stop each replicate after generating an interfailure time sequence
of length $n$; thus $r$ replicates generated $r$ sequences of interfailure times of the form
$t_{1j}, t_{2j}, \ldots, t_{nj}$ for $1 \le j \le r$. Both Nagel and Dunham used random inputs from a known input
distribution to generate test cases and counted test cases between failures as the interfailure time. They did
not, however, remove a fixed number of bugs from each replicate but rather terminated each replicate after a
fixed number of test cases or in some cases, for economic reasons, when a rare bug was encountered. Our
simulation also represents situations where interfailure times are measured in clock time or in calendar
time.


In order to get simulated data representing replicated debugging of a particular model, we
assigned values to the necessary parameters and repeatedly simulated sequences of $n$ interfailure times
for that model until we had $r$ such sequences. For ease of calculation we reduced this $r \times n$ matrix of
data to a single sequence of average interfailure times by letting

$$\bar{t}_i = \sum_{j=1}^{r} \frac{t_{ij}}{r} \quad \text{for } 1 \le i \le n.$$
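
The reduction of the $r \times n$ matrix to one averaged sequence can be sketched as follows. The helper names are hypothetical, and the Jelinski-Moranda simulator is included only to make the example self-contained.

```python
import math
import random

def simulate_jm_run(N, phi, n, rng):
    """n interfailure times; the i-th (0-based) has rate (N - i) * phi."""
    return [-math.log(1.0 - rng.random()) / ((N - i) * phi) for i in range(n)]

def average_replicates(runs):
    """Average r equal-length sequences position by position."""
    n = len(runs[0])
    if any(len(run) != n for run in runs):
        raise ValueError("all replicates must contain the same number of times")
    r = len(runs)
    return [sum(run[i] for run in runs) / r for i in range(n)]

rng = random.Random(0)                  # hypothetical seed
replicates = [simulate_jm_run(100, 0.001, 70, rng) for _ in range(30)]
averaged = average_replicates(replicates)   # one sequence of length n = 70
```

The averaged sequence is then passed to the model in place of the data from a single debugging run.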

When we repeated the tests for both models using interfailure data which was the average of $r$
replications instead of from a single debugging, we observed that the models performed monotonically
better as $r$ increased. Each of the histograms in figures 3 and 4 was constructed by generating 128 sets
of averaged interfailure times. These histograms, when considered with those in the previous section,
indicate that the models require more than the normal debugging data in order to give good predictions
and that replication offers a remedy. In particular, figure 3.d.1 indicates that with
30 replicates the Jelinski-Moranda model with $n = 70$ gives estimates between 95 and 105 about
80% of the time and almost always gives estimates between 85 and 115.
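
One way to see why averaging helps, offered here as a supporting calculation and assuming the $r$ replicates are independent draws from the same process (as they are in the simulation), is to compare the spread of a single interfailure time with that of the average. Each $t_{ij}$ is exponentially distributed with rate $\lambda_i$, so

$$E[\bar{t}_i] = \frac{1}{\lambda_i}, \qquad \mathrm{Var}[\bar{t}_i] = \frac{1}{r \lambda_i^2}, \qquad \frac{\sqrt{\mathrm{Var}[\bar{t}_i]}}{E[\bar{t}_i]} = \frac{1}{\sqrt{r}} .$$

With $r = 30$ the relative spread of each input value is about $1/\sqrt{30} \approx 0.18$, compared with 1 for a single debugging run.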

5. Confidence Intervals in Predictions


If we wish to quantify our confidence in the predictions of the models, then we can look at
confidence intervals. If we wish to be 90% certain that the estimate is within 10% of the value of the
parameter we are trying to predict, then we can achieve this by increasing $r$ or $n$.

The following graph shows the $(n,r)$ pairs which combine to give estimates within 10% of the
value of $N-n$ for the Jelinski-Moranda model with $N = 100$ and $\phi = 0.001$. It is based on 2,500
repetitions for each $(n,r)$ pair displayed. This graph shows that good predictions are possible for simulated
data with replication and it also indicates that without replication one should not expect good
predictions.
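
A single point of such a graph could be estimated along the lines of the sketch below. All names are hypothetical, the repetition count is reduced from 2,500 to 200 to keep the run short, and the estimator is the bisection solver assumed in the earlier sketches rather than the authors' program. For a given $(n,r)$ pair it returns the fraction of repetitions in which $\hat{N} - n$ falls within 10% of $N - n$.

```python
import math
import random

def simulate_jm_run(N, phi, n, rng):
    """n interfailure times; the i-th (0-based) has rate (N - i) * phi."""
    return [-math.log(1.0 - rng.random()) / ((N - i) * phi) for i in range(n)]

def average_replicates(runs):
    """Average r equal-length sequences position by position."""
    r = len(runs)
    return [sum(col) / r for col in zip(*runs)]

def estimate_jm_N(times):
    """Maximum likelihood N_hat (continuous N); inf if no finite maximum."""
    n = len(times)
    A = sum(i * t for i, t in enumerate(times)) / sum(times)
    if A <= (n - 1) / 2:
        return math.inf

    def f(N):
        return sum(1.0 / (N - i) for i in range(n)) - n / (N - A)

    lo, hi = n - 1 + 1e-9, float(n)
    while f(hi) > 0:
        lo, hi = hi, 2.0 * hi
        if hi > 1e9:
            return math.inf
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def coverage(N, phi, n, r, reps, rng, tol=0.10):
    """Fraction of repetitions with N_hat - n within tol of N - n."""
    target = N - n
    hits = 0
    for _ in range(reps):
        runs = [simulate_jm_run(N, phi, n, rng) for _ in range(r)]
        N_hat = estimate_jm_N(average_replicates(runs))
        if abs((N_hat - n) - target) <= tol * target:
            hits += 1
    return hits / reps

p_1 = coverage(100, 0.001, 70, r=1, reps=200, rng=random.Random(0))
p_30 = coverage(100, 0.001, 70, r=30, reps=200, rng=random.Random(0))
```

Sweeping `n` and `r` and marking the pairs whose coverage reaches the desired confidence level, for example 0.90, would give a graph of the kind described.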



Fig. 5 Confidence Interval Graph 


A similar graph could be constructed for the Geometric Model. It too would support the need for
replication and show that by increasing either $n$ or $r$ one can obtain better estimates. The Nagel
experimental design generates replicated data efficiently by exploiting failure information and fixes discovered
during the normal debugging process. This process was automated by Dunham in order to gather
replicated data even more efficiently. Further, this could run in the background or in parallel with the
debugging effort, and thus it will be less expensive in both time and money to increase $r$ rather than $n$.

6. Summary


Neither the Jelinski-Moranda model nor the Geometric model should be used to make predictions
without replication. This does not guarantee that either model with replication will give good estimates,
since the goodness of fit problem has not been addressed here and previous efforts to validate these
models have not used replicated data and hence are suspect. It is clear from this work that random
chance is likely to dominate if one uses only the data from a single debugging run. It is also claimed
that the field of software reliability has been hindered by the random nature of the data and that
replication offers a solution to this problem by removing the randomness from the data.